Thursday, September 21, 2022

Today

  • Making maps

Making maps using functions from the {tmap} package

The {tmap} package has functions for creating thematic maps. The syntax is like the syntax of the functions in {ggplot2}. The functions work with a variety of spatial data.

Consider the simple feature data frame called World from the {tmap} package.

library(tmap)

data("World")
str(World)
## Classes 'sf' and 'data.frame':   177 obs. of  16 variables:
##  $ iso_a3      : Factor w/ 177 levels "AFG","AGO","ALB",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ name        : Factor w/ 177 levels "Afghanistan",..: 1 4 2 166 6 7 5 56 8 9 ...
##  $ sovereignt  : Factor w/ 171 levels "Afghanistan",..: 1 4 2 159 6 7 5 52 8 9 ...
##  $ continent   : Factor w/ 8 levels "Africa","Antarctica",..: 3 1 4 3 8 3 2 7 6 4 ...
##  $ area        : Units: [km^2] num  652860 1246700 27400 71252 2736690 ...
##  $ pop_est     : num  28400000 12799293 3639453 4798491 40913584 ...
##  $ pop_est_dens: num  43.5 10.3 132.8 67.3 15 ...
##  $ economy     : Factor w/ 7 levels "1. Developed region: G7",..: 7 7 6 6 5 6 6 6 2 2 ...
##  $ income_grp  : Factor w/ 5 levels "1. High income: OECD",..: 5 3 4 2 3 4 2 2 1 1 ...
##  $ gdp_cap_est : num  784 8618 5993 38408 14027 ...
##  $ life_exp    : num  59.7 NA 77.3 NA 75.9 ...
##  $ well_being  : num  3.8 NA 5.5 NA 6.5 4.3 NA NA 7.2 7.4 ...
##  $ footprint   : num  0.79 NA 2.21 NA 3.14 2.23 NA NA 9.31 6.06 ...
##  $ inequality  : num  0.427 NA 0.165 NA 0.164 ...
##  $ HPI         : num  20.2 NA 36.8 NA 35.2 ...
##  $ geometry    :sfc_MULTIPOLYGON of length 177; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:69, 1:2] 61.2 62.2 63 63.2 64 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:15] "iso_a3" "name" "sovereignt" "continent" ...

The spatial data frame contains socioeconomic indicators from 177 countries around the world. Each row is one country’s indicators.

You make a map by first specifying the spatial data frame using the tm_shape() function and then you add a layer consistent with the geometry.

For example, if you want a map showing the index of happiness (column name HPI) by country, use the tm_shape() function to identify the spatial data frame World then add a fill layer with the tm_polygons() function.

The fill is specified by the argument col = indicating the specific column from the data frame. Here use HPI.

tm_shape(shp = World) +
    tm_polygons(col = "HPI")

The tm_polygons() function with the argument col = colors the countries based on the values in the column HPI of the World data frame.

Map layers are added with the + operator.

Caution: the column in the data frame World must be specified using quotes "HPI". This is different from the functions in the {ggplot2} package.

To show two thematic maps together each with a different variable, specify col = c("HPI", "well_being")

The tm_polygons() function splits the values in the specified column into meaningful groups (here 8) and countries with missing values (NA) values are colored gray.

More (or fewer) intervals can be specified with the n = argument, but the cutoff values are chosen at appropriate places.

Example: Mapping tornadoes

Consider the tornado data from the U.S. Storm Prediction Center (SPC). It is downloaded as a shapefile in the directory data/1950-2018-torn-aspath.

A shapefile is imported with the sf::st_read() function from the {sf} package.

Tornadoes.sf <- sf::st_read(dsn = "data/1950-2018-torn-aspath")
## Reading layer `1950-2018-torn-aspath' from data source 
##   `/Users/jelsner/Desktop/Projects/QG-2022/data/1950-2018-torn-aspath' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 63645 features and 22 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: -163.53 ymin: 18.13 xmax: -64.9 ymax: 61.02
## Geodetic CRS:  WGS 84

The assigned file is a simple feature data frame with 63645 features (observations) and 23 fields (variables).

Each row (observation) is a unique tornado.

Look inside the simple feature data frame with the glimpse() function from the {dplyr} package.

dplyr::glimpse(Tornadoes.sf)
## Rows: 63,645
## Columns: 23
## $ om       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
## $ yr       <dbl> 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1…
## $ mo       <dbl> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ dy       <dbl> 3, 3, 3, 13, 25, 25, 26, 11, 11, 11, 11, 12, 12, 12, 12, 12, …
## $ date     <chr> "1950-01-03", "1950-01-03", "1950-01-03", "1950-01-13", "1950…
## $ time     <chr> "11:00:00", "11:55:00", "16:00:00", "05:25:00", "19:30:00", "…
## $ tz       <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ st       <chr> "MO", "IL", "OH", "AR", "MO", "IL", "TX", "TX", "TX", "TX", "…
## $ stf      <dbl> 29, 17, 39, 5, 29, 17, 48, 48, 48, 48, 48, 48, 48, 48, 48, 28…
## $ stn      <dbl> 1, 2, 1, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 10, 2, 1, …
## $ mag      <dbl> 3, 3, 1, 3, 2, 2, 2, 2, 3, 2, 2, 2, 1, 2, 1, 2, 1, 3, 2, 4, 2…
## $ inj      <dbl> 3, 3, 1, 1, 5, 0, 2, 0, 12, 5, 6, 8, 0, 0, 32, 2, 0, 15, 0, 7…
## $ fat      <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 3, 0, 3, 0, 18, …
## $ loss     <dbl> 6, 5, 4, 3, 5, 5, 0, 4, 4, 5, 5, 4, 4, 4, 5, 4, 0, 5, 3, 5, 5…
## $ closs    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ slat     <dbl> 38.77, 39.10, 40.88, 34.40, 37.60, 41.17, 26.88, 29.42, 29.67…
## $ slon     <dbl> -90.22, -89.30, -84.58, -94.37, -90.68, -87.33, -98.12, -95.2…
## $ elat     <dbl> 38.8300, 39.1200, 40.8801, 34.4001, 37.6300, 41.1701, 26.8800…
## $ elon     <dbl> -90.0300, -89.2300, -84.5799, -94.3699, -90.6500, -87.3299, -…
## $ len      <dbl> 9.5, 3.6, 0.1, 0.6, 2.3, 0.1, 4.7, 9.9, 12.0, 4.6, 4.5, 8.0, …
## $ wid      <dbl> 150, 130, 10, 17, 300, 100, 133, 400, 1000, 100, 67, 833, 233…
## $ fc       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ geometry <LINESTRING [°]> LINESTRING (-90.22 38.77, -..., LINESTRING (-89.3 …

The first 22 columns are variables (attributes). The last column contains the geometry. Information in the geometry column is in well-known text (WKT) format.

Each tornado is a coded as a LINESTRING with a start and end location. This is where the tm_shape() function looks for the geographic information.

Here you make a map showing the tracks of all the tornadoes since 2011. First filter the data frame keeping only tornadoes occurring after the year (yr) 2010.

TornadoesSince2011.sf <- 
  Tornadoes.sf |>
  dplyr::filter(yr >= 2011) 

Next get a file containing the boundaries of the lower 48 states.

USA_48.sf <- USAboundaries::us_states() |>
   dplyr::filter(!state_name %in% c("Hawaii", "Alaska", "Puerto Rico"))

Then use the tm_shape() function together with the tm_borders() layer to draw the boundaries before adding the tornadoes. The tornadoes are in a separate spatial data frame so you use the tm_shape() function together with the tm_lines() layer.

tm_shape(shp = USA_48.sf, projection = 5070) +
  tm_borders() +
tm_shape(shp = TornadoesSince2011.sf) +
    tm_lines(col = "red")

The objects named TornadoesSince2011.sf and USA_48.sf are simple feature data frames. You map variables in the data frames as layers with successive calls to the tm_shape() function.

The default projection is geographic (latitude-longitude) which is changed using the projection = argument and specifying a EPSG number (or proj4 string). Here you use 5070 corresponding to USA Contiguous Albers Equal Area Conic, USGS (EPSG = 5070 or 102003).

You make the map interactive by first turning on the "view" mode with the tmap_mode() function before running the code.

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(USA_48.sf) +
  tm_borders() +
tm_shape(TornadoesSince2011.sf) +
    tm_lines(col = "red")

You can now zoom, pan, and change the background layers.

Switch back to plot mode by typing.

tmap_mode("plot")
## tmap mode set to plotting

Example: Mapping the frequency of tornadoes by state

Suppose you want to show the number of tornadoes originating in each state on a map. You first need to prepare the data.

You do this with a series of then statements connected by pipes (|>). Start by assigning to the object TornadoeCountsByState.df the contents of Tornadoes.sf then remove the the geometry column, then remove states outside lower 48 using the dplyr::filter() function, then group by state, then summarize creating a colunm called nT that keeps track of the number of rows (dplyr::n()), then change the column name of st to state_abbr to match the state name abbreviation in the USA_48.sf data frame.

TornadoCountsByState.df <- Tornadoes.sf |>
  sf::st_drop_geometry() |>
  dplyr::filter(st != "PR" & st != "HI" & st != "AK") |>
  dplyr::group_by(st) |>
  dplyr::summarize(nT = dplyr::n()) |>
  dplyr::rename(state_abbr = st)

dplyr::glimpse(TornadoCountsByState.df)
## Rows: 49
## Columns: 2
## $ state_abbr <chr> "AL", "AR", "AZ", "CA", "CO", "CT", "DC", "DE", "FL", "GA",…
## $ nT         <int> 2143, 1809, 250, 436, 2174, 104, 2, 61, 3381, 1652, 2570, 2…

The resulting data frame contains the grouped-by column state_abbr (origin state) and the corresponding number of tornadoes. There were 459 tornadoes in Alabama since 2011, 255 in Arkansas, etc.

Next you need to join the new data frame with the spatial data frame. You join the TornadoCountsByState.df data frame with the map simple feature data frame USA_48.sf using the dplyr::left_join() function and recycling the name.

USA_48.sf <-dplyr::left_join(USA_48.sf,
                             TornadoCountsByState.df,
                             by = "state_abbr") 

names(USA_48.sf)
##  [1] "statefp"           "statens"           "affgeoid"         
##  [4] "geoid"             "stusps"            "name"             
##  [7] "lsad"              "aland"             "awater"           
## [10] "state_name"        "state_abbr"        "jurisdiction_type"
## [13] "nT"                "geometry"

Notice that you now have a new column in the spatial data frame USA_48.sf named nT that contains the number of tornadoes in that state.

Next you create a draft map to see if things look correct.

tm_shape(shp = USA_48.sf, projection = 5070) +
  tm_polygons(col = "nT", 
           title = "Tornado Counts",
           palette = "Oranges")

Tornadoes are most common in the southern Great Plains into the Southeast.

You improve the defaults with additional layers including text, compass, and scale bar. The last layer is the print view.

tm_shape(shp = USA_48.sf, projection = 5070) +
  tm_polygons(col = "nT", 
              border.col = "gray70",
              title = "Tornado Counts",
              palette = "Oranges") +
  tm_text("nT", size = .5) +
  tm_compass() + 
  tm_scale_bar(lwd = .5)

The format of the {tmap} objects (meoms) are like those of the {ggplot2} geometric objects (geoms) making it easy to quickly map your data. Fine details are worked out in production.

More information?

Calculations using the geometry simple feature column

Spatial data analysis often requires calculations on the geometry. Two of the most common are computing centroids (geographic centers) and buffers.

Geometry calculations should be done on projected coordinates. To see what CRS the simple feature data frame has use st_crs().

sf::st_crs(USA_48.sf)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

Note the length unit (LENGTHUNIT[]) is meter.

Here transform the CRS of the USA_48.sf simple feature data frame to a U.S. National Atlas equal area (EPSG: 2163) and then check it.

USA_48.sf <- USA_48.sf |>
  sf::st_transform(crs = 2163)

sf::st_crs(USA_48.sf)
## Coordinate Reference System:
##   User input: EPSG:2163 
##   wkt:
## PROJCRS["NAD27 / US National Atlas Equal Area",
##     BASEGEOGCRS["NAD27",
##         DATUM["North American Datum 1927",
##             ELLIPSOID["Clarke 1866",6378206.4,294.978698213898,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4267]],
##     CONVERSION["US National Atlas Equal Area",
##         METHOD["Lambert Azimuthal Equal Area (Spherical)",
##             ID["EPSG",1027]],
##         PARAMETER["Latitude of natural origin",45,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-100,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["False easting",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["easting (X)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["northing (Y)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Statistical analysis."],
##         AREA["United States (USA) - onshore and offshore."],
##         BBOX[15.56,167.65,74.71,-65.69]],
##     ID["EPSG",9311]]

The centroid calculation locates the center of geographic objects representing the center of mass for the spatial object (think of balancing a plate on your finger).

You calculate the geographic centroid of each of the lower 48 states with the st_centroid() function.

geo_centroid.sf <- sf::st_centroid(USA_48.sf)
## Warning in st_centroid.sf(USA_48.sf): st_centroid assumes attributes are
## constant over geometries of x

The result is a simple feature data frame where the geometry is a single point for each state. You keep track of the fact that this is a simple feature data frame by using an object name that includes appends with .sf.

The warning tells you that the attributes in the new simple feature data frame may not make sense with the new geometry.

For example, compare the first two rows of the two simple feature data frames.

head(geo_centroid.sf, n = 2)
## Simple feature collection with 2 features and 13 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -1711894 ymin: -666720.2 xmax: 791612.5 ymax: 7216.145
## Projected CRS: NAD27 / US National Atlas Equal Area
##   statefp  statens    affgeoid geoid stusps       name lsad        aland
## 1      06 01779778 0400000US06    06     CA California   00 403671196038
## 2      55 01779806 0400000US55    55     WI  Wisconsin   00 140292246684
##        awater state_name state_abbr jurisdiction_type   nT
## 1 20294133830 California         CA             state  436
## 2 29343721650  Wisconsin         WI             state 1380
##                     geometry
## 1 POINT (-1711894 -666720.2)
## 2  POINT (791612.5 7216.145)
head(USA_48.sf, n = 2)
## Simple feature collection with 2 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -2036903 ymin: -1242190 xmax: 1027143 ymax: 269562.7
## Projected CRS: NAD27 / US National Atlas Equal Area
##   statefp  statens    affgeoid geoid stusps       name lsad        aland
## 1      06 01779778 0400000US06    06     CA California   00 403671196038
## 2      55 01779806 0400000US55    55     WI  Wisconsin   00 140292246684
##        awater state_name state_abbr jurisdiction_type   nT
## 1 20294133830 California         CA             state  436
## 2 29343721650  Wisconsin         WI             state 1380
##                         geometry
## 1 MULTIPOLYGON (((-1719948 -1...
## 2 MULTIPOLYGON (((1017108 129...

The land area (aland) makes sense when the geometry is MULTIPOLYGON it is less congruent when the geometry is POINT.

You map the points using the tm_dots() function after first mapping the state borders.

tm_shape(shp = USA_48.sf) +
  tm_borders(col = "gray70") +
tm_shape(shp = geo_centroid.sf) +
  tm_dots(size = 1,
          col = "black")

Buffers are polygons representing the area within a given distance of a geometric feature. Regardless of whether the feature is a point, a line, or a polygon.

The function sf::st_buffer() computes the buffer and you set the distance with the dist = argument. Here you create a new simple feature data frame with only the state of Florida.

You then compute a 50 km (50,000 meters) buffer and save the resulting polygon

FL.sf <- USA_48.sf |>
           dplyr::filter(state_abbr == "FL")

FL_buffer.sf <- sf::st_buffer(FL.sf, 
                              dist = 50000)

Create a map containing the state border, the 50 km buffer, and the centroid. Include a compass arrow and a scale bar.

tm_shape(FL_buffer.sf) +
  tm_borders(col = "gray70") +
tm_shape(FL.sf) +
  tm_borders() +
tm_shape(geo_centroid.sf) +
  tm_dots(size = 2) +
tm_compass(position = c("left", "bottom")) + 
tm_scale_bar(text.size = 1, position = c("left", "bottom"))

The result is a map that could serve as a map of your study area (usually Figure 1 in scientific report).

Making raster maps

The package {ggmap} retrieves raster map tiles (groups of pixels) from services like Google Maps and plots them using the {ggplot2} grammar.

Map tiles are rasters as static image files generated by the mapping service. You do not need data files containing information on things like scale, projection, boundaries, etc. because that information is created by the map tile.

This limits the ability to redraw or change the appearance of the map but it allows for easy overlays of data onto the map.

Get map images using functions from the {ggmap} package

You get map tiles with the ggmap::get_map() function from the {ggmap} package. You specify the bounding box (or the center and zoom). The bounding box requires the left-bottom and right-top corners of the region specified as longitude and latitude in decimal degrees.

For instance, to obtain a map of Tallahassee from the stamen mapping service you first set the bounding box (left-bottom corner as -84.41, 30.37 and right-top corner as -84.19, 30.55) then use the ggmap::get_stamenmap() function with a zoom level of 12.

library(ggmap)

TLH_bb <- c(left = -84.41,
            bottom = 30.37,
            right = -84.19,
            top = 30.55)

TLH_map <- ggmap::get_stamenmap(bbox = TLH_bb,
                                zoom = 12)
TLH_map
## 609x641 terrain map image from Stamen Maps. 
## See ?ggmap to plot it.

The saved object (TLH_map) is a raster map specified by the class ggmap.

To view the map, use ggmap() function.

ggmap(TLH_map)

The zoom = argument in the get_stamenmap() function controls the level of detail. The larger the number, the greater the detail.

Trial and error helps you decide on the appropriate level of detail depending on the data you need to visualize. Use boxfinder to determine the exact longitude/latitude coordinates for the bounding box you wish to obtain.

Or you can use the tmaptools::geocode_OSM() function from the {tmaptools} package. We first specify a location then get a geocoded coordinate.

FSU.list <- tmaptools::geocode_OSM("Florida State University")
FSU.list
## $query
## [1] "Florida State University"
## 
## $coords
##         x         y 
## -84.29748  30.44236 
## 
## $bbox
##      xmin      ymin      xmax      ymax 
## -84.30650  30.43563 -84.28846  30.44907

The object FSU.list is a list containing three elements query, coords and bbox. You are interested in the bbox element so you save that as vector that you assign FSU_bb and rename the elements to left, bottom, right, and top.

FSU_bb <- FSU.list$bbox
names(FSU_bb) <- c("left", "bottom", 
                   "right", "top")
FSU_bb
##      left    bottom     right       top 
## -84.30650  30.43563 -84.28846  30.44907

You then get the map tiles corresponding to the bounding box from the stamen map service with a zoom of 16 and create the map.

FSU_map <- ggmap::get_stamenmap(bbox = FSU_bb, 
                                zoom = 16)
ggmap(FSU_map)

Add data to the raster map

Let’s consider a map of Chicago.

CHI_bb <- c(left = -87.936287,
            bottom = 41.679835,
            right = -87.447052,
            top = 42.000835)

CHI_map <- get_stamenmap(bbox = CHI_bb,
                         zoom = 11,
                         messaging = FALSE)
ggmap(CHI_map)

The city of Chicago has a data portal publishing a large volume of public records. Here we look at crime data from 2017. The file car_thefts.csv is a spreadsheet obtained from that portal with a list of car thefts.

You read these data using the readr::read_csv() function.

carTheft <- readr::read_csv(file = "data/car_thefts.csv")
## New names:
## * `` -> ...1
## Rows: 11416 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Case.Number, Date, Block, IUCR, Primary.Type, Description, Locatio...
## dbl (11): ...1, ID, Beat, District, Ward, Community.Area, X.Coordinate, Y.Co...
## lgl  (2): Arrest, Domestic
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(carTheft)
## # A tibble: 6 × 23
##    ...1       ID Case.Number Date           Block IUCR  Primary.Type Description
##   <dbl>    <dbl> <chr>       <chr>          <chr> <chr> <chr>        <chr>      
## 1     1 10810096 JA107583    01/07/2017 05… 004X… 0910  MOTOR VEHIC… AUTOMOBILE 
## 2     2 10810539 JA108792    01/08/2017 09… 032X… 0930  MOTOR VEHIC… THEFT/RECO…
## 3     3 10811381 JA110666    01/08/2017 07… 011X… 0910  MOTOR VEHIC… AUTOMOBILE 
## 4     4 10811599 JA109921    01/09/2017 04… 061X… 0910  MOTOR VEHIC… AUTOMOBILE 
## 5     5 10811645 JA110998    01/10/2017 07… 049X… 0910  MOTOR VEHIC… AUTOMOBILE 
## 6     6 10811674 JA111011    01/10/2017 05… 037X… 0910  MOTOR VEHIC… AUTOMOBILE 
## # … with 15 more variables: Location.Description <chr>, Arrest <lgl>,
## #   Domestic <lgl>, Beat <dbl>, District <dbl>, Ward <dbl>,
## #   Community.Area <dbl>, FBI.Code <chr>, X.Coordinate <dbl>,
## #   Y.Coordinate <dbl>, Year <dbl>, Updated.On <chr>, Latitude <dbl>,
## #   Longitude <dbl>, Location <chr>

Each row of the data frame is a single report of a vehicle theft. Location is encoded in several ways, though most importantly for us the longitude and latitude of the theft is encoded in the Longitude and Latitude columns, respectively.

You use the geom_point() function to map the location of every theft. Because ggmap() uses the map tiles (here, defined by CHI_map) as the first layer, you specify data and mapping inside of geom_point().

ggmap(CHI_map) +
  geom_point(data = carTheft,
             mapping = aes(x = Longitude,
                           y = Latitude),
             size = .25,
             alpha = .1)
## Warning: Removed 425 rows containing missing values (geom_point).

Note ggmap() replaces ggplot().

Spatial density maps (extra material)

Instead of relying on geom_point() and plotting the raw data, another approach is to create a heat map. This is done with a density estimator. Since the map has two dimensions and the density estimator requires a ‘kernel’ function the procedure is called a 2-D kernel density estimation (KDE).

KDE will take all the data (i.e. reported vehicle thefts) and convert it into a smoothed plot showing geographic concentrations of crime. KDE is a type of data smoothing where inferences about the population are made based on a finite data sample.

The core function in {ggplot2} to generate this kind of plot is geom_density_2d().

ggmap(CHI_map) +
  geom_density_2d(data = carTheft,
                  aes(x = Longitude,
                      y = Latitude))
## Warning: Removed 425 rows containing non-finite values (stat_density2d).

By default, geom_density_2d() draws a contour plot with lines of constant value. That is, each line represents approximately the same frequency of crime along that specific line. Contour plots are often used in maps (known as topographic maps) to denote elevation.

Rather than drawing lines you fill in the graph by using the fill aesthetic to draw bands of crime density. To do that, you use the related function stat_density_2d().

ggmap(CHI_map) +
  stat_density_2d(data = carTheft,
                  aes(x = Longitude,
                      y = Latitude,
                      fill = stat(level)),
                  geom = "polygon")
## Warning: Removed 425 rows containing non-finite values (stat_density2d).

Note the two new arguments:

  • geom = "polygon" - change the geometric object to be drawn from a geom_density_2d() geom to a polygon geom
  • fill = stat(level) - the value for the fill aesthetic is the level calculated within stat_density_2d(), which you access using the stat() notation.

This is an improvement, but you can adjust some settings to make the graph visually more useful. Specifically,

  • Increase the number of bins, or unique bands of color allowed on the graph
  • Make the colors semi-transparent using alpha so you can still view the underlying map
  • Change the color palette to better distinguish between high and low crime areas.

Here you use RColorBrewer::brewer.pal() from the {RColorBrewer} package to create a custom color palette using reds and yellows.

ggmap(CHI_map) +
  stat_density_2d(data = carTheft,
                  aes(x = Longitude,
                      y = Latitude,
                      fill = stat(level)),
                  alpha = .2,
                  bins = 25,
                  geom = "polygon") +
  scale_fill_gradientn(colors = RColorBrewer::brewer.pal(7, "YlOrRd"))
## Warning: Removed 425 rows containing non-finite values (stat_density2d).

The downtown region has the highest rate of vehicle theft. Not surprising given its population density during the workday. There are also clusters of vehicle thefts on the south and west sides.